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Abstract 

Our article deals with Bayesian inference for a general state space model with the 
simulated likelihood computed by the particle filter. We show empirically that the 
partially or fully adapted particle filters can be much more efficient than the standard 
particle filter, especially when the signal to noise ratio is high. This is especially im- 
portant because using the particle filter within Markov chain Monte Carlo sampling 
is 0(T 2 ), where T is the sample size. We also show that an adaptive independent 
Metropolis Hastings proposal for the unknown parameters based on a mixture of nor- 
mals can be much more efficient than the usual optimal random walk methods because 
the simulated likelihood is not continuous in the parameters and the cost of constructing 
a good adaptive proposal is negligible compared to the cost of evaluating the simulated 
likelihood. Independent Metropolis-Hastings proposals are also attractive because they 
are easy to run in parallel on multiple processors. The article also shows that the 
proposed adaptive independent Metropolis Hastings sampler converges to the posterior 
distribution. We also show that the marginal likelihood of any state space model can 
be obtained in an efficient and unbiased manner by using the particle filter making 
model comparison straightforward. Obtaining the marginal likelihood is often difficult 
using other methods. Finally, we prove that the simulated likelihood obtained by the 
auxiliary particle filter is unbiased. This result is fundamental to using the particle 
filter for Markov chain Monte Carlo sampling and is first obtained in a more abstract 

* Corresponding author. 



and difficult setting by Del Moral (2004). However, our proof is direct and will make 
the result accessible to readers. 

Keywords: Auxiliary variables; Bayesian inference; Bridge sampling; Full and partial 
adaptation; Marginal likelihood. 



1 Introduction 



Our article builds on the work of Andrieu et al. (2010 ) and develops general simulation meth- 
ods that make Bayesian inference for time series state space models feasible and efficient. Our 
first contribution is to show empirically that partially or fully adapted auxiliary particle filters 



using the simulated likelihood defined in Pitt (2002) can be much more efficient statistically 



than the standard standard particle filter of Gordon et al. (1993), especially when the signal 



to noise ratio is high, because they reduce the noise in the simulated likelihood. It is very 
important to carry out the particle filter as efficiently as possible, because particle filtering 
within Markov chain Monte Carlo sampling is 0(T 2 ), where T is the sample size as explained 
in Section I2TT1 

Adaptive sampling methods are simulation methods for carrying out Bayesian inference 
that use previous iterates of the simulation to form proposal distributions, that is, the adap- 
tive samplers learn about some aspects of the posterior distribution from previous iterates. 



See for example Haario et al. (2001), Atchade and Rosenthal (2005) and Roberts and Rosen- 



thal (2009) who consider adaptive random walk Metropolis proposals and Giordani and Kohn 



(2010) who base their proposal on a mixture of normals. 



The second contribution of the article is to show that when working with a simulated 
likelihood it is worthwhile constructing adaptive independent Metropolis Hastings proposals 
that provide good approximations to the posterior density. The first reason for this claim 
is that the simulated likelihood is not continuous in the unknown parameters. This means 
that standard methods for constructing proposals such as Laplace approximations based on 
analytic or numerical derivatives are usually infeasible. It also means that the usual optimal 
random walk methods do not perform as well as expected as the probability of acceptance 
does not tend to 1 as a proposed move becomes more local moves or even if the parameter 
does not change at all. The second reason is that the cost of constructing a good adaptive 
proposal is often negligible compared to the cost of running the particle filter to obtain the 
simulated likelihood. Third, an adaptive sampling scheme that consists entirely or mostly of 
independent Metropolis-Hastings steps is attractive because a large part of the computation 
can be run in parallel thus substantially reducing computing time. 



Our article uses the adaptive independent Metropolis Hastings sampler of Giordani and 



Kohn (2010 ) which approximates the posterior density by a mixture of normals. We show that 



this proposal density can be much more efficient than the adaptive random walk Metropolis 



proposal of Roberts and Rosenthal (2009) for the reasons just outlined. We also show that 
this adaptive sampler converges to the correct posterior distribution. We note, however, that 



in our experience, the adaptive random walk Metropolis algorithm of Roberts and Rosenthal 



(2009) is important because it converges reliably for a diverse set of problems and provides 



a good way to initialize other more efficient adaptive sampling schemes. 

The third contribution of our article is to show that the marginal likelihood of any state 
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space model can be estimated in an efficient and unbiased manner by combining particle 
filtering with bridge or importance sampling. This makes it straightforward to compare the 
marginal likelihoods of two or more models each of which can be expressed in state space 
form. 

The final contribution of the article is to show that the simulated likelihood obtained 
the auxiliary particle filter is unbiased. This result is obtained in an abstract setting in 
Proposition 7.4.1 in Section 7.4.2 inlDel Morall (120041). lAndrieu et all (120101) show that the 



unbiasedness of the simulated likelihood allows Bayesian inference using Markov chain Monte 
Carlo simulation. This is because the simulated likelihood can viewed as the density of the 
observations conditional on the unknown parameters and a set of auxiliary latent variables. 
We believe that our derivation of unbiasedness is more direct and accessible than that of 



Del Moral (2004). 



Computational algorithms for state space models such as the Kalman filter and particle 
filter are useful because many time series models can be expressed in state space form. Com- 
putational methods for Bayesian inference for Gaussian state space model are well developed 
(see Cappe et al. , 2005 ) and there is a literature now on Bayesian computational methods for 
non-Gaussian state space models. Markov chain Monte Carlo computational methods based 
on the particle filter have the potential to greatly increase the number and complexity of time 
series models amenable to Bayesian analysis. An early of the particle filter within an Markov 
chain Monte Carlo framework is by Fernandez- Villaverde and Rub io- Ramirez (2007) who 
applied it to macroeconomic models as an approximate approach for obtaining the posterior 
distribution of the parameters. 



Particle filtering (also known as sequential Monte Carlo) was proposed by Gordon et al. 



(1993) for online filtering and prediction of nonlinear or non-Gaussian state space models. 
The auxiliary particle filter method was introduced by Pitt and Shephard (1999) to improve 
the performance of the standard particle filter when the observation equation is informative 
relative to the state equations, that is when the signal to noise ratio is moderate to high. 
There is an extensive literature on online filtering using the particle filter, see for example 



Kitagawa (1996), Liu and Chen (1998), Doucet et al. (2000), Doucet et al. (2001), Andrieu 



and Doucet (2002), Fearnhead and Clifford (2003) and Del Moral et al. (2006). Our article 



considers only the standard particle filter of Gordon et al. ( 1993 ) and the fully and partially 
adapted particle filters proposed by Pitt and Shephard (1999). 

The literature on using the particle filter to learn about model parameters is more limited. 
Pitt (2002) proposes the smooth particle filter to estimate the parameters of a state space 



using maximum likelihood. Storvik (2002) and Poison et al. (2008) consider online parameter 
learning when sufficient statistics are available. Andrieu et al. (2010) provide a framework 
for off-line parameter learning using the particle filter. Flury and Shephard (2008) give an 
insightful discussion of the results of Andrieu et al. (2010) and use single parameter random 
walk proposals for off-line Bayesian inference. 



Our article is an updated version of Silva et al. (2009), which contains some extra exam- 
ples. 
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2 State space models 



Consider a state space model with observation equation p(y t \x t ; 9) and state transition equa- 
tion p(x t \x t -i] 9) , where y t and x t are the observation and the state at time t and 9 is a 



vector of unknown parameters. The distribution of the initial state is p(x Q \9). See Cappe 



et al. (2005) for a modern treatment of general state space models. The filtering equations 



for the state space model (for t > 1) are (West and Harrison, 1997, pp. 506-507) 



p{x t \yi:t-i; 9) 
p(x t \yi: t ;9) 
p(yt\yi:t-i, 9) 



p{x t \x t -x] 9)p(x t -i\yi:t-i] 9)dx t _ 1} 

p(yt\x t ;9)p(x t \yi:t-i;9) 
p(y t \yi:t-i, 9) 

p(y t \x t ; 9)p(x t \yi :t -i; 9)dx t . 



(la) 
(lb) 
(lcl 



where y\- t = {yi, . . . ,yt}- Equations (la)-(lc) allow (in principle) for filtering for a given 9 
and for evaluating the likelihood of the observations y = yi-x, 



T-l 



P(y\9) =p(yi\0)Y[p(y 



t+l\yi:t,' 



(2) 



If the likelihood p(y\9) can be computed, maximum likelihood and MCMC methods can be 
used to carry out inference on the parameters 9, with the states integrated out. When both 
the observation and state transition equations are linear and Gaussian the likelihood can be 



evaluated analytically using the Kalman filter (Cappe et al. , 2005, pp. 141-143). More general 
state space models can also be estimated by MCMC methods if auxiliary latent variables are 



introduced, e.g. Kim et al. (1998) and Fruhwirth-Schnatter and Wagner (2006) and/or the 



states are sampled in blocks as in Shephard and Pitt (1997). See section 6.3 of 



(2005) for a review of Markov chain Monte Carlo methods applied to genera 



Cappe et al. 



state space 



models. 



In general, however, the integrals in equations (la)-(lc) are computationally intractable 



and the standard particle filter algorithm (SIR) was proposed by Gordon et al. (1993) as 



a method for approximating them with the approximation becoming exact as the number 



of particles tends to infinity. Pitt and Shephard (1999) propose the auxiliary particle filter 



method (ASIR) which is more efficient than standard particle filter when the observation 
density is informative relative to the transition density. The general auxiliary particle filter 
is described in Section |2~T1 



2.1 General ASIR method 



The general auxiliary SIR (ASIR) filter of Pitt and Shephard ( 1999 ) may be thought of as a 
generalisation of the SIR method of Gordon et al. (1993). We therefore focus on this, more 
general, approach. To simplify notation in this section, we omit to show dependence on the 
unknown parameter vector 9. The following algorithm describes the one time step ASIR 



update and is initialized with samples Xq ~ p(xo) with mass 1/M for k 



M. 
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Algorithm 1. Given samples x\ ~ p{xt\yi-.t) with mass n k for k = 1, M. 
For t = 0,..,T - 1 : 



1. For k — 1 : M, compute ui k 



t\t+i 



9{.yt+i\x k t )n , 



71 



t\t+l yM w i 



2. For k — 1 : M, sample x t ~ ^ i=1 ^tit+i^O^ — x t)- 
5. For = 1 : M, sample x^ +1 ~ ^(^i+ilx^; yt+i)- 
4- For k — 1 : M, compute 



UJ. 



t+i 



p{yt+i 


x k t+1 )p(x 


k 

t+i 


x k t ) 


g(yt+i 


X 




~k . 

x t j 


yt+i) 



UJ 



7i f 



t+1 



EM 



t+1 



Note that in Step 2, 5(x — a) is the delta function with unit mass at x = a. In addition, 
in Step 2, multinomial sampling may be employed but stratified sampling is generally to be 



and Shephard (2001). 



preferred and is employed throughout, see Kitagawa (1996), Carpenter et al. (1999) and Pitt 



Note that the true joint density may be written as, 

p(y t+1 \x t+1 )p(x t+1 \x t ) = p(y t+1 \x t )p(x t+1 \x t ; y t+1 ) 

where 



p(y t+1 \x t ) = J p(y t+ i\x t+ i)p(x t+ i\x t )dx t+1 , 
p{x t+ i \x t ; y t+1 ) = p(y t+1 \x t+1 )p(x t+1 \x t )/p(y t+1 \x t ). 

Typically this fully adapted form is unavailable but when it is the approximating joint density 
may be chosen to be the true joint. That is, 

g(y t+1 \x t )g(x t+1 \x t ; y t+1 ) = p(y t+1 \x t )p(x t+1 \x t ; y t+1 ). 



In this case Step 4 becomes redundant as u k +l 



, ( 7rf +1 = 1/M) and the method reduces to 
what |Pitt and Shephard| (120011) call the fully adapted algorithm. The fully adapted method 
is the most efficient in estimating the likelihood and is generally the optimal filter a single 
time step ahead. 



The SIR method of Gordon et al. (1993) arises when the joint proposal is chosen as, 

g{y t +i\x t ) x g{x t+1 \x t ) = 1 x p(x t+1 \x t ), 
in which case, g(y t+ i\x t ) is constant and g(x t +i\x t ; yt+i) = v( x t+i\ x t)- In this case, step (1) 



above leaves the weights unchanged (as 7r£j t+1 = irf) 



The goal of the auxiliary particle filter is to get as close to full adaption as possible, when 
full adaption is not analytically possible. This is achieved by making g(y t+ i\x t ) as close to 
p(yt+i\xt) as a function of x t as possible (up to a constant of proportionality) and the density 



5 



g(xt+i\x t ; Vt+i) as close to p(x t +i\x t ] yt+i) as possible. Various procedures are found for doing 



this; see for example, Pitt and Shephard (2001) and Smith and Santos (2006). 



The general ASIR estimator of p(y t \yut-i) , introduced and used by 



Pitt (2002), is 



p A (yt\yi-.t-i) 




(3) 



are defined above and calculated as part of the ASIR 



The two sets of weights uj\ and oj\_ x ^ t 
algorithm. These two quantities are again a simple by-product of the algorithm. We define 
the information in the swarm of particles at time t as At = {x^; irf}. For full adaption uj\ = 1 
and oj^_^ t = p(y t \x^_ 1 ) / M and the first summation in ^ disappears. For the SIR method, 
uj\ = and w*_i[t — n t-i and the second summation in (3) disappears. 

The ASIR Algorithm [I] is a flexible particle filter approacn when combined with strat- 
ification. Theorem [I] establishes that this algorithm together with the estimator of ^ is 
unbiased. This is important as it enables very efficient likelihood estimators from the ASIR 
method to be used within an MCMC algorithm. 

Theorem 1. The ASIR likelihood 

P A (yi-.t 

is unbiased in the sense that 



V 



\yi)Y[p A (yt\yi:t-i] 



(4) 



t=2 



E{p A {yi-.t)) = p{y 



The theorem is proved in Section 7.4.2, Proposition 7.4.1 of Del Moral (2004). We give a 
more direct and accessible proof in Appendix [Aj 

Our examples use the standard particle filter and the fully adapted particle filter, and the 



partially adapted particle filter described in Appendix B.l 



Simulated Likelihood The ASIR likelihood estimate H] is called the simulated likeli- 
hood and Theorem [T] shows that the general ASIR particle filter provides a simulated likeli- 
hood that is an unbiased estimate of the true likelihood function. Andrieu et al. (2010) show 



that we can view the simulated likelihood p(y\0) as the density of y conditional on 6 and a 
set of auxiliary variables u that are not a function of 9 and such that p{y\0) = ps{y\0,u), 
where the subscript S denotes a simulated likelihood, and 



p s (y\9,u)p(u)du = p{y\9). 



(5) 



The variables u represent the uniform variates used for the multinomial/statified draws and 
the random variates (e.g. standard Gaussian) used in simulating from g(x t +x\x t , yt+i)- It 
follows that the posterior ps(0\y) = p{6\y) so that a method that simulates from ps(0,u\y) 
yields iterates from the correct posterior p{6\y). We note that the ideas of using a simulated 
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likelihood for Bayesian inference have also been explored, outside the area of particle filters 
in the work of Beaumont (2003) and Andrieu and Roberts (2009) 



We note that the variance of the log of the simulated likelihood is 0(T/M) so it will be 
necessary to take the number of particles M = 0(T) to keep a constant standard deviation 
as T increases. This implies that the particle filter MCMC algorithm is of order 0(T 2 ) for 
T large and means that it is important to make the particle filter as efficient as possible. 



3 Adaptive sampling for the simulated likelihood 

The target density for posterior inference is ps{9, u\y) oc ps(y\9, u)p{9)p{u) , where p{6) is the 
prior for 9. It may therefore be possible to use a Metropolis-Hastings simulation method 
to generate samples from the target density as follows. Suppose that given some initial 9q, 
the j — 1 iterates (9i, ui), . . . , (8j-i, Uj-%) have been generated. The jth iterate, (9j,Uj), 
is generated from the proposal density qj(9;9)p(u), which may also depend on some other 
value of 9 which we call 9. Let (9j,Uj) be the proposed value of (9j,Uj) generated from 
qj(9; 9j_i)p{u). Then we take 



'31 u j/ 



a 



u p ) 



mm 



P , u p ) with probability 

pg(y|fl?,tffip(ff) qjj^l 
Ps(y\8j-i,u j . l )p(9 j ^ 1 )q j (9 p j ;9 j . 



(6) 



1,^-1 



) otherwise. We say that 



withp(w^) and p(uj-i) cancelling out, and take 
the proposal is independent if qj(9;9) = qj(9). 

In adaptive sampling the parameters of qj(9; 1 
When the likelihood can be measured exactly, i.e., in the non-particle filter case, then it can 
be shown that under appropriate regularity conditions, the sequence of iterates 9j,j > 1, 
converges to draws from the target distribution 



are estimated from the iterates 9i, . . . ,9 



3-2- 



Sec 



and Rosenthal (2009) and Giordani and Kohn (2010). 



Roberts and Rosenthal (2007), Roberts 



Our article uses the adaptive independent Metropolis Hastings scheme of Giordani and 



Kohn 



(2010) and the adaptive random walk Metropolis scheme of Roberts and Rosenthal 



(2009). They are discussed in Appendix C Appendix C.3 proves that the adaptive inde- 



pendent Metropolis Hastings sampler converges to the target distribution under the given 
conditions. Both the standard particle filter and the fully adapted particle filter satisfy these 
conditions almost automatically. The appendix also shows that in the partially adapted case 
a simple mixture of a partially adapted particle filter and the standard particle filter also 
satisfies the conditions for convergence. 



3.1 Adaptive sampling and parallel computation 

Carrying out Bayesian inference using the particle filter and Markov chain Monte Carlo 
simulation is computationally expensive. However, parallel processing can greatly increase 
the speed and therefore the areas of application of our methods. For adaptive independent 
Metropolis Hastings proposals we can use the following three step approach. Let 9 C the cur- 
rent value of 9 generated by the sampling scheme and q c {9) the current proposal density for 
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9. (a) For each of J processors generate K proposed values of 9, which we write as 9jl, k = 

1, . . . ,K, and compute the corresponding logs of the ratios p(y\9 ( fl)p(9^)/q(9^ P l). (b) Af- 
ter each K block of proposed values is generated for each processor, carry out Metropolis- 
Hastings selection of the JK proposed {9^} parameters using a single processor to obtain 
{9j } k} draws from the chain. This last step is fast because it is only necessary to draw uniform 
variates. (c) Use the previous iterates and the 9j t k to update the proposal density q c {9) and 
9 C . In our applications of this approach, K is chosen so that K J is aproximately the time 
between updates of the adaptive independent Metropolis Hastings sampling scheme. 

A second approach applies to all Metropolis-Hastings sampling schemes, and in particular 
to the adaptive random walk Metropolis proposal. Suppose that J processors are available. 
The likelihood is estimated for a given 9 on each of the processors using the particle filter 
with M particles and these estimates are then averaged to get an estimate of the likelihood 
based on JM particles. This approach is similar to, but faster, than using a single processor 
and makes it possible to estimate the likelihood using a large number of particles. However, 
for a given number of generated particles, the first approach can be shown to be statistically 
more efficient than the second. 



3.2 Estimating the marginal likelihood 



Marginal likelihoods are often used to compare two or more models. For a given model, let 
9 be the vector of model parameters, p(y\9) the likelihood of the observations y and p(9) the 
prior for 9. The marginal likelihood is defined as 



P(y) = / p{y\9)p{9)d9. 



(7) 



which in our case can also be written as 



P{y) = J Ps{y\9,u)p(9)p(u)d9 du . (8) 

It is often difficult to evaluate or estimate p(y) in non-Gaussian state space models, although 
auxiliary variable methods can be used in some problems. See Fruhwirth-Schnatter and 



Wagner (2008). Appendix D briefly outlines how the marginal likelihood can be estimated 



using bridge or importance sampling, with the computation carried out within the adaptive 
sampling framework so that a separate simulation run is unnecessary. 



4 Comparing the standard SIR particle filter with adapted 
ASIR particle filters 

It is instructive to compare the performance of the standard particle filter with the fully 
and partially adapted particle filters for different signal to noise ratios and using different 
numbers of particles. We use two simulated examples. The first example compares the 
standard particle filter and the fully adapted particle filter for a Gaussian autoregressive 
signal observed with Gaussian noise. This example is also of interest because we can compute 
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the exact likelihood using the Kalman filter, which is equivalent to using an infinite number 
of particles. The second example compares the standard SIR to a partially adapted particle 
filter using a a binomial model where we vary the signal to noise ratio by varying the number 
of binomial trials. 

In both examples, we make the comparison in terms of three criteria. The first is the 
acceptance rate of the adaptive independent Metropolis Hastings sampler, which we define 
as the percentage of accepted draws. The second is the inefficiencies of the iterates of the 



parameters obtained using the adaptive independent Metropolis Hastings method of Gior- 



dani and Kohn (2010). The third is the standard deviation of the simulated log-likelihood 
Ps(y\0, u) evaluated at the true value of 9, which is a good measure of how close the particle 
filter likelihood is to the true likelihood p(y\6). 

We define the inefficiency of the sampling scheme for a given parameter as the variance of 
the parameter estimate divided by its variance if the sampling scheme generates independent 
iterates. We estimate the inefficiency factor for a given parameter as IF = 1 + 2^2- x pj, 
where pj is the estimated autocorrelation of the parameter iterates at lag j. As a rule of 
thumb, the maximum number of lags L* that we use is L* = min{1000, L} , where L is the 
lowest index j such that \pj\ < 2/VK where K is the sample size used to compute pj. 

4.1 Example 1: Autogressive model observed with noise 

Consider the following first order autoregression (AR(1)) plus noise model, 

y t \x t ~ J\f(x t ,cr 2 ) 
x t+ i\x t ~ Af(p + <f>(x t - p),r 2 ) (9) 
x o ~A%,t 2 /(1-0 2 )). (10) 

The prior distributions are p ~ jV(0, 100), <p ~ U(0, 1), a 2 ~ ZQ (0.1, 0.1) and r 2 ~ Z(?(0.1, 0.1). 
The notation A/"(a, b 2 ) means a normal distribution with mean a and variance b 2 , U(a, b) 
means a uniform distribution on (a, b) and ZQ(a,b) means an inverse gamma distribution 
with shape parameter a and scale b. 

Our simulation study uses 50 replicated data sets with 500 observations each, generated 
by setting p = 0, = 0.6, r 2 = 1, x ~ N(p, r 2 /(l — 2 )) and two values for a 2 = {0.01, 1.0}, 
corresponding to high and low signal to noise ratios. We ran 30 000 iterations of the adaptive 
independent Metropolis-Hastings for the posterior distribution using the standard particle 
filter and the fully adapted particle filter with differing number of particles. For completeness, 
we also ran the adaptive sampling scheme with the Kalman filter using the exact likelihood. 
The update times for the adaptive independent Metropolis Hastings were at iterations 100, 
200, 500, 1000, 1500, 2000, 3000, 4000, 5000, 10000, 15000 and 20000. We initialized the 
AIMH based on a normal proposal formed from 5000 draws of a previous run of the ARWM 
and the Kalman filter. For each signal to noise ratio we report the median MCMC parameter 
inefficiencies over the 50 replications as well as the interquartile range of the inefficiencies 
for differing numbers of particles. We also report results on the standard deviation of the 
simulated log likelihood for the standard particle filter and the fully adapted particle filter. 
Specifically, for each of the 50 replicated data sets we computed the log likelihood at the true 
parameter values 1000 times for each of the two particle filters and obtained the median and 



9 



the interquartile range of the medians and standard deviations of the log likelihoods across 
the 50 data replicates. 

Results for the high signal to noise case Tables [T] and [2] report the results for the 
high signal to noise case with a 2 = 0.01. Table [l] shows that the median variance of the 
simulated log likelihood log ps(y\9, u) at the true parameter values for the standard particle 
filter with 2000 particles is over 400 times higher than the median variance of the fully adapted 
particle filter using 100 particles. This suggests that to get the same standard deviation for 
the simulated likelihood we would need approximately 8000 times as many particles for the 
standard particle filter as for the fully adapted particle filter as we know that the variance 
decreases approximately in inverse proportion to the number of particles. 

Table [2] shows that the median acceptance rate of the adaptive independent Metropolis 
Hastings sampler using the fully adapted particle filter with 100 particles is about 1.75 times 
higher than the median acceptance rate of the standard particle filter using 4000 particles. 
The table also shows that the median parameter inefficiencies are about 3 times higher for 
the standard particle filter using 4000 particles than for the fully adapted particle filter using 
100 particles. Finally, the table shows that the fully adapted particle filter using 500 particles 
performs almost as well as using the exact likelihood. 

Table 1: AR(1) + noise. High signal to noise. Medians and interquantile ranges (IQR) of the 
estimated medians and standard deviations of the log of the simulated likelihood function at 
the true value for 50 different data sets. 



N. Particles 


Median 


Standard Deviation 


Median IQR 


Median IQR 




Standard 


Particle Filter 


100 


-839.12 83.34 


44.0381 21.0316 


500 


-729.43 26.09 


10.5420 8.2875 


1000 


-719.10 20.13 


5.5507 5.2818 


2000 


-714.95 18.16 


2.8977 2.4716 




Fully Adapted Particle Filter 


100 


-711.69 17.74 


0.1431 0.0160 



Results for the low signal to noise case Tables [3] and [4] report the results for the low 
signal to noise case with a 2 = 1.0. Table [3] shows that the median variance of the log of the 
simulated likelihood at the true parameter values for the standard particle filter using 1000 
particles is about the same as the median variance of the simulated likelihood for the fully 
adapted particle filter using 100 particles, i.e. the variance of the simulated log likelihood of 
the standard particle filter is about 10 times that of the fully adapted particle filter for the 
same number of particles. 

Table [4] shows that the median acceptance rates and parameter inefficiencies of the adap- 
tive independent Metropolis Hastings sampler using the fully adapted particle filter with 100 
particles are about the those of the standard particle filter using 4000 particles. 
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Table 2: AR(1) + noise. High signal to noise. Medians and interquartile range (IQR) of the 
acceptance rates and the inefficiencies over 50 replications of the autoregressive model using 
different particle filters and adaptive independent Metropolis-Hastings. 



N. Particles 


Ac. Rate 


a 2 


T 2 


A 1 


4> 


Median IQR 


Median IQR 


Median IQR 


Median IQR 


Median IQR 




Kalman Filter 




72.18 4.89 


1.93 0.34 


1.85 0.35 


1.76 0.20 


1.83 0.24 




Standard Particle Filter 


500 
1000 
2000 
4000 


0.05 0.78 
9.04 5.75 
21.81 10.87 
33.27 9.19 


836.65 1727.31 
70.25 35.10 
21.84 20.64 
9.33 6.33 


1102.87 1732.95 
63.76 30.98 
22.97 17.62 
9.11 7.20 


1058.80 1733.35 
59.76 58.41 
20.48 23.16 
9.66 6.58 


979.64 1729.08 
64.64 42.55 
25.09 24.07 
8.95 8.13 




Fully Adapted Particle Filter 


100 
500 


58.83 3.04 
67.64 2.30 


3.02 0.56 
2.23 0.31 


2.91 0.66 
2.08 0.31 


2.63 0.43 
2.02 0.20 


2.80 0.45 
2.10 0.24 



Table 3: AR(1) + noise. Low signal to noise. Medians and interquantile ranges (IQR) of the 
estimated medians and standard deviations of the log of the simulated likelihood function at 
the true value for 5 different data sets. 



N. Particles 


Median 


Standard Deviation 


Median IQR 


Median IQR 




Standard Particle Filter 


100 


-904.0827 19.0013 


2.4479 0.2372 


500 


-901.7877 18.8020 


1.0793 0.1170 


1000 


-901.4966 18.7909 


0.7629 0.0550 




Fully Adapted Particle Filter 


100 


-901.4727 18.8540 


0.7057 0.0398 



Table 4: AR(1) + noise. Low signal to noise. Medians and interquartile range (IQR) of the 
acceptance rates and the inefficiencies over 50 replications of the autoregressive model using 
different particle filters and adaptive independent Metropolis-Hastings. 



N. Particles 


Ac. Rate 


a 2 


T 2 






Median IQR 


Median IQR 


Median IQR 


Median IQR 


Median IQR 




Kalman Filter 




73.81 2.26 


2.04 0.38 


2.12 0.46 


1.73 0.12 


1.94 0.25 




Standard Particle Filter 


500 
1000 
2000 
4000 


6.84 6.28 
29.86 18.15 
42.47 13.37 
52.95 11.85 


92.85 76.67 
19.67 27.30 
11.87 11.20 
6.38 6.86 


95.95 71.78 
21.10 24.91 
10.72 8.98 
6.16 4.53 


71.63 51.26 
10.87 16.07 
5.36 3.62 
3.28 2.54 


85.93 68.60 
18.18 28.90 
9.02 6.88 
5.02 4.11 




Fully Adapted Particle Filter 


100 


53.94 9.24 


3.48 0.88 


3.53 0.84 


3.27 1.32 


3.62 0.92 
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4.2 Example 2: Binomial model with an autoregressive state equa- 
tion 



Consider observations generated from the following dynamic binomial model 

y t ~ Bin(m, n t ) , 7T t = exp(x t )/ (1 + exp(x t )) 

where m is the number of trials and 7r t is the probability of success of each trial. The 
states Xt follow the first order AR(1) model ^ whose initial distribution is (10). The prior 



distributions for the parameters are /i ~ A/"(0, 100), (ft ~ U(0, 1) and r 2 ~ "HjV(lOO). We use 
the notation 1-iM{b 2 ) to mean a half-normal distribution with scale h. 



Our simulation study is organized similarly to that in Section 4.1 The data generating 
process takes /i = 0, = 0.97, r 2 = 0.25 and the number of trials takes the two values 
m = {100,500}. We initialized the AIMH based on a normal proposal formed from 5000 
draws of a previous run of the ARWM and the partially adapted particle filter using 500 
particles. 

We note the following about the binomial density. 

1. The standard particle filter will do worse as the number of trials m increases because 
the the measurement density becomes more informative and peaked so the variance of 
the weights increases. 

2. The opposite is true for the partially adapted particle filter method. The measurement 
density p{yt\xt) tends to normality as m increases by the cental limit theorem, so that 
the partially adapted particle filter tends to a fully adapted particle filter. Hence the 
partially adapted particle filter method actually improves as m becomes larger and this 
is seen in Tables [5] and [7] below. 



High signal to noise case Tables [5] and [6] report the results for the high signal to 
noise case, with the number of trials set at m = 500. Table [5] shows that the variance of 
the simulated log likelihood at the true parameter values for the standard particle filter with 
4000 particles is about 2.5 times higher than that of the partially adapted particle filter when 
100 particles are used. This means that it is necessary to have 100 times as many particles 
using the standard particle filter to get the same noise level for the simulated log likelihood 
as for the partially adapted particle filter. 

Table [6] shows that the median acceptance rate of the adaptive independent Metropolis 
Hastings sampler using the partially adapted particle filter with 100 particles is about 1.3 
times higher than the median acceptance rate of standard particle filter using 4000 particles. 
The table also shows that the median parameter inefficiencies are 1.5 times higher for the 
standard particle filter using 4000 particles than they are for the partially adapted particle 
filter using 100 particles. 



Low signal to noise case Tables [7] and [8] report the results for the low signal to noise 
case, with the number of trials set at m = 100. Table [7] shows that the median variance 
of simulated log likelihood at the true parameter values for the standard particle filter with 
4000 particles is about the same as that obtained by the partially adapted particle filter using 
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Table 5: Binomial example. High signal to noise. Medians and interquantile ranges (IQR) 
of the estimated medians and standard deviations of the log-likelihood function at the true 
value for 50 different data sets. 



N. Particles 


Median 


Standard Deviation 


Median IQR 


Median 


IQR 




Standard Particle Filter 


500 


-2441.29 114.63 


3.3149 


1.5044 


1000 


-2439.64 114.39 


2.0737 


0.8711 


2000 


-2438.88 117.55 


1.4106 


0.3203 


4000 


-2438.45 118.61 


0.9478 


0.1593 




Partially Adapted Particle Filter 


100 


-2438.28 119.38 


0.6182 


0.1358 



Table 6: Binomial example. High signal to noise. Medians and interquartile range (IQR) of 
the acceptance rates and the inefficiencies over 50 replications of the binomial model using 
different particle filters and adaptive independent Metropolis-Hastings. 



N. Particles 


Ac. Rate 


A 4 


logitO) 


log(r 2 ) 


Median IQR 


Median IQR 


Median IQR 


Median IQR 




Standard Particle Filter 


500 
1000 
2000 
4000 


3.66 2.57 
13.28 4.13 
27.18 5.54 
39.73 5.85 


92.85 99.02 
39.84 47.49 
12.26 7.83 
6.01 3.86 


100.32 369.21 
38.77 32.71 
12.15 7.61 
5.93 3.00 


105.76 330.84 
40.89 23.31 
13.29 10.00 
5.82 2.14 




Partially Adapted Particle Filter 


100 


51.52 11.17 


3.82 3.06 


3.74 2.09 


3.39 1.35 



250 particles, i.e. the standard particle filter requires about 16 times as many particles to 
obtain the same standard deviation as the partially adapted particle filter. 

Table [8] shows that the median acceptance rate of the adaptive independent Metropolis 
Hastings sampler using the partially adapted particle filter with 200 particles is higher than 
the median acceptance rate of standard particle filter using 2000 particles. The table also 
shows that the median parameter inefficiencies for the standard particle filter using 2000 
particles are higher than the median inefficiencies for the partially adapted particle filter 
using 200 particles. 



5 Performance of the adaptive sampling schemes on 
real examples 

This section uses real data to illustrate the flexibility and wide applicability of the approach 
that combines particle filtering with adaptive sampling. All that is necessary for model esti- 
mation and model comparison by marginal likelihood is to code up a particle filter to evaluate 
the simulated likelihood and to code up the prior on the parameters. We also illustrate the 
difference in performance between the adaptive random walk Metropolis sampling scheme 



of Roberts and Rosenthal (2009) and that the adaptive independent Metropolis Hastings 



scheme of Giordani and Kohn (2010). This comparison is interesting for two reasons. First, 
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Table 7: Binomial example. Low signal to noise. Medians and interquantile ranges (IQR) 
of the estimated medians and standard deviations of the log-likelihood function at the true 
value for 50 different data sets. 



N. Particles 


Median 


Standard Deviation 


Median IQR 


Median 


IQR 




Standard Particle Filter 


500 


-1729.05 109.49 


1.8385 


0.2960 


1000 


-1728.16 109.63 


1.2704 


0.2457 


2000 


-1727.69 109.72 


0.8827 


0.1458 


4000 


-1727.50 109.82 


0.6300 


0.0711 




Partially Adapted Particle Filter 


100 


-1727.81 109.84 


0.9867 


0.1074 


200 


-1727.55 109.85 


0.7132 


0.0810 


500 


-1727.41 109.90 


0.4465 


0.0550 



Table 8: Binomial example. Low signal to noise. Medians and interquartile range (IQR) of 
the acceptance rates and the inefficiencies over 50 replications of the binomial model using 
different particle filters and adaptive independent Metropolis-Hastings. 



N. Particles 


Ac. Rate 




logitO) 


log(r^) 


Median IQR 


Median IQR 


Median IQR 


Median IQR 




Standard Particle Filter 


500 
1000 
2000 
4000 


17.02 4.82 

31.01 5.46 
43.47 5.91 

54.02 4.91 


29.37 17.65 
10.85 7.54 
5.27 3.30 
3.70 3.35 


31.85 16.33 
10.51 7.14 
5.17 2.10 
3.48 1.94 


31.57 15.90 
10.33 4.73 
5.00 1.79 
3.03 0.58 




Partially Adapted Particle Filter 


100 
200 
500 


39.16 7.15 
50.59 5.19 
60.80 4.19 


5.06 3.60 
3.62 2.26 
2.80 3.32 


5.99 2.63 
3.81 2.02 
2.47 1.23 


5.96 2.50 
3.59 1.21 
2.32 0.32 



the adaptive independent Metropolis Hastings scheme tries to obtain a good approximation 
to the posterior density, whereas the adaptive random walk Metropolis aims for some target 
acceptance rate. Second, we claim that any independent Metropolis Hastings scheme (of 
which the adaptive independent Metropolis Hastings scheme of Giordani and Kohn (2010) 
is an example), is more suitable to be implemented in parallel than a Metropolis-Hastings 
scheme with a proposal that depends on the previous iterate (of which the adaptive random 
walk Metropolis scheme of Roberts and Rosenthal (2009) is an example). 

The comparison between the two schemes is in terms of three criteria. The first two are 
the acceptance rate of the Metropolis-Hastings method and the inefficiency factors (IF) of 
the parameters and are defined in Section |4j they are independent of the way the algorithms 
are implemented. However, these two criteria do not take into account the times taken by 
the samplers. To obtain an overall measure of the effectiveness of a sampler, we define its 
equivalent computing time ECT = 1000 x IF x t, where t is the time per iteration of the 
sampler. We interpret ECT as the time taken by the sampler to attain the same accuracy 
as that attained by 1000 independent draws of the same sampler. For two samplers a and b, 
ECT a /ECTb is the ratio of times taken by them to achieve the same accuracy. We note that 
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the time per iteration for a given sampling algorithm depends on how it is implemented, i.e. 
the language used, whether operations are vectorized, etc. 

The results presented are for a single processor and the two parallel methods discussed in 
Section 3A_ To simplify the presentation, we mainly present results for the standard particle 
filter. 



5.1 Example 1: Stochastic volatility model with leverage and out- 
liers 

The first example considers the univariate stochastic volatility (SV) model 

y t = K t exp(x t /2)e t , e t ~ jV(0, 1) 

x t+l = fi + <f>(x t - y) + cr v Vt, Vt ~ A/"(0, 1) 

where corr(e t , rj t ) = p, Pr(K t = 2.5) = u and ~Pv(K t = 1) = 1 — u, with u> << 1. This is a 
state space model with a non-Gaussian observation equation and a Gaussian state transition 
equation for the latent volatility x% which follows a first order autoregressive model. The SV 
model allows for leverage because the errors in the observation and state transition equations 
can be correlated. The model also allows for outliers in the observation equation because 
the standard deviation of y t given x t can be 2.5 its usual size when K t = 2.5. To complete 
the model specification, we assume that all parameters are independent a priori with the 
following prior distributions: fi ~ A/"(0, 10 2 ), ~ W (0 ,i)(0.9, 0.1), a 2 ~ 10(0.01,0.01), 
and p ~ 7W(_i ) i)(0, 10 6 ). We use the notation TAf( C! d)(a,b) to mean a truncated normal 
with location a and scale b restricted to the interval (c, d) and IQ(a, b) is an inverse gamma 
distribution with shape parameter a, scale parameter b and mode bj [a + 1). We set u> = 0.03 
in the general model to indicate that outliers are rare apriori. 





Shephard 


2005 


and Pitt 


(200^ 


J) by 



Shephard (2005) reviews SV models and a model of the form (11) is estimated by Malik 



S&P 500 index We apply the SV model (jllp to the Standard and Poors (S&P) 500 
data from 02/Jan/1970 to 14/Dec/1973 obtained from Yahoo Finance web site^J The data 
consists of T = 1 000 observations. 

Table [9] shows the acceptance rates, the inefficiencies and the equivalent computing time 
over 10 replications of the stochastic volatility model using thestandard particle filter and 
the two adaptive Metropolis-Hastings schemes. The analysis uses the SV model without 
leverage or outliers. In the table, SP stands for a single processor, MPi for multiprocessor 
method 1 and MP 2 for multiprocessor method 2 (where the simulated likelihood is obtained 



as an average) described in Section 3.1 We use eight processors for both the MPx and MP 2 
schemes. The basic number of particles in this example is K = 500, which means that SP 
uses 4000 particles in a single processor, MPi uses 4000 particles in each processor and MP 2 
uses 500 particles in each processor. We ran all the algorithm for 10000 iterations and took 
the last 5000 to compute the results. The equivalent computing time is obtained by taking 
the overall time divided by the number of iterations times the inefficiency times 1000. The 



1 http://au.finance.yahoo.com/q/hp?s="GSPC 
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update times for the adaptive independent Metropolis Hastings using SP or MP2 were at 
100, 200, 500, 1000, 2000, 3000, 4000, 5000, 6000 and 7500. The block sizes (also the update 
times) for the adaptive Metropolis-Hastings MPi were 15, 25, 60, 125, 250, 375, 500, 625, 
750 and 940. 

The table shows that the acceptance rates of the adaptive independent Metropolis Hast- 
ings sampler are about twice those of the adaptive random walk Metropolis and the inef- 
ficiencies are about 1/6 of those for the adaptive random walk Metropolis. For ECT, the 
best approach for adaptive independent Metropolis Hastings is MPi and is between 11 and 
18 times better than the best available approach (which is MP 2 ) for adaptive random walk 
Metropolis. Qualitatively similar results were obtained for K = 1000 particles. 



Table 9: SV model. Medians and interquartile range (between brackets) of the acceptance 
rates, the inefficiencies and the equivalent computing time (t x inefficiency x 1000) over 10 
replications of the stochastic volatility model using the standard particle filter and differ- 
ing adaptive Metropolis-Hastings schemes. SP = single processor, MPi = multiprocessor 
Metropolis-Hastings and MP2 = multiprocessor averaging the likelihood function. 



Algorithm 


Ac. Rate 


Inefficiency 


Equivalent Computing Time 


fj, logit(0) log(cr^) 


fi logit(0) log(o^) 


ARWM-SP 


24.5 
(0.6) 


25.47 30.20 20.00 
(16.33) (5.12) (2.80) 


13463.7 15972.9 10603.0 
(8589.8) (2655.3) (1514.0) 


ARWM-MP 2 


22.2 
(5.5) 


25.04 31.07 20.51 
(12.96) (14.02) (8.83) 


3594.5 4453.4 2930.4 
(1886.9) (1994.3) (1284.3) 


AIMH-SP 


51.6 
(0.8) 


6.45 3.46 3.08 
(7.12) (0.83) (0.07) 


3430.9 1841.7 1638.7 
(3777.9) (439.7) (39.9) 


AIMH-MPi 


52.3 
(3.3) 


3.44 3.42 3.63 
(2.13) (0.76) (0.50) 


237.8 235.3 250.0 
(147.4) (50.5) (34.6) 


AIMH-MP2 


53.6 
(3.2) 


4.40 3.52 3.15 
(6.74) (3.37) (0.56) 


627.1 504.4 451.1 
(975.4) (482.5) (75.8) 



Model selection We now use importance sampling and bridge sampling to compute 
the marginal likelihoods of the four SV models: the model with no leverage effect (p = 0) and 
no outlier effect (u = 0), the model that allows for leverage but not outliers, the model that 
allows for outliers but no leverage and the general model that allows for both outliers and 



leverage. Table [10] shows the logarithms of the marginal likelihoods of the four models for a 
single run of each algorithm. The differences between the two approaches are very small. In 
this example, and based on our prior distributions, the SV model with leverage effects but 
no outliers has the highest marginal likelihood. 

Posterior estimates of model parameters The estimated posterior means and stan- 



dard deviations of all four models are given in Table 11 



5.2 Example 2: GARCH model observed with noise 

The GARCH(1,1) model is used extensively to model financial returns, see for example, 



Bollerslev et al. ( |1994 ). In this section we consider the GARCH(1,1) model observed with 
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Table 10: Logarithms of the marginal likelihoods for four different SV models for the two 
particle filter algorithms computed using the adaptive independent Metropolis Hastings al- 
gorithm. BS and IS mean bridge sampling and importance sampling. 



Model 


Standard Particle Filter 


Partially Adapted Particle Filter 


log(pss(y)) 


log(p/s(y)) 


log(pss(y)) 


log(pjs(y)) 


SV 


-1072.9 


-1072.9 


-1072.9 


-1072.9 


SV Lev. 


-1065.0 


-1065.0 


-1065.0 


-1065.0 


SV Out. 


-1076.6 


-1076.6 


-1076.5 


-1076.4 


SV Lev. Out. 


-1069.3 


-1069.3 


-1069.2 


-1069.3 



Table 11: S&P 500 data: Estimated posterior means and standard deviations for all four 
stochastic volatility models. 



Parameter 


SV 


SV Lev. 


SV Out. 


SV Lev. Out 


Mean S. Dev. 


Mean S. Dev. 


Mean S. Dev. 


Mean S. Dev. 


A* 

4> 

T 2 

p 


-0.4329 1.2314 
0.9879 0.0097 
0.0142 0.0068 


-0.5642 0.1500 
0.9811 0.0063 
0.0106 0.0037 

-0.7608 0.0960 


-0.1786 2.2812 
0.9907 0.0086 
0.0116 0.0053 


-0.5756 0.3497 
0.9830 0.0065 
0.0091 0.0034 

-0.7652 0.0960 







Gaussian noise which is a more flexible version of the basic model. The model is 



y t \x t ~Af( Xt ,T 

2 A/Vn J2 



2^ 



^+ik t 2 +i~^(° 



2 i a 2 , 2 

<*t+l = a + " X t + l°t 

xo~AT(0,a/(l-/3- 7 )). 

The priors on r 2 and a are r 2 ~ "HA/"(100) and a ~ "HjV(IOO). The joint prior for and 7 
is uniform in the region ft > 0, 7 > 0, j5 + 7 < 1. 



It is straightforward to show that this model is fully adapted. See Appendix B.2 Instead 
of using the GARCH(1,1) model with noise we can use other members of the GARCH family, 
e.g. an EGARCH process observed with noise. All such models are fully adapted. 

MSCI UK index returns We model the weekly MSCI UK index returns from 6 
January 2000 to 28 January 2010 corresponding to 526 weekly observations shown in Figure [T} 



Table |12| compares different adaptive sampling algorithms and particles filters in terms of 
acceptance rates, inefficiencies and equivalent computing times. Medians and interquantile 
ranges are computed using 50 replication of each adaptive sampling, particle filter and number 
of particles. The adaptive sampling algorithms were run for 30 000 iterations. The first 20 
000 draws were discarded and the remainder used to compute the statistics. The update 
times for the adaptive independent Metropolis-Hastings were at 100, 200, 500, 1000, 1500, 
2000, 3000, 4000, 5000, 10000, 15000 and 20000. The initial value and proposal distribution 
for all algorithms were based on a single short run of the adaptive random walk and all results 
are for a single processor. 
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Figure 1: UK MSCI weekly returns from 6 January 2000 to 28 January 2010. 



The table shows that the fully adapted particle filter is much more efficient than the 
standard particle filter for both the adaptive random walk Metropolis and the adaptive 
independent Metropolis Hastings samplers, and that the adaptive independent Metropolis 
Hastings sampler is much more efficient than the adaptive random walk Metropolis sampler 
for both the standard and the fully adapted particle filters. In particular, the table shows that 
adaptive independent Metropolis Hastings combined with full adaptation using 200 particles 
is about four times as efficient as the adaptive random walk Metropolis sampler using the 
standard particle filter and 10, 000 particles. 



Table [13] shows the standard deviations of the simulated log-likelihood function for the 
particle filters using differing number of particles. The statistics are based on 1000 replications 
of the particle filter with the parameters fixed at their posterior means. The summary 



statistics of the posterior distribution is shown in Table 14 The table shows that the standard 
deviation of the simulated log likelihood using the fully adapted particle filter with 500 
particles is smaller than the standard deviation of the simulated log likelihood using 10, 000 
particles. 

Table [Ml is a summary of the posterior distributions of the four parameters. 
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Table 12: Medians and interquartile range (between brackets) of the acceptance rates, the 
inefficiencies and the equivalent computing time for 50 replications of the Gaussian GARCH 
model observed with noise applied to the UK index return using differing particle filters, 
number of particles and adaptive Metropolis-Hastings algorithms. A single processor was 
used for all results. 



Algorithm 


#of 
Particles 


Accept. 
Rate 


Inefficiency 


t' 2 a (3 7 




Standard Particle Filter 


ARWM 


1000 


6.90 
(0.99) 


82.75 93.79 89.87 90.84 
(15.40) (19.02) (18.98) (24.55) 


5000 


14.72 
(1.07) 


39.36 48.69 56.63 53.00 
(5.96) (15.40) (11.01) (14.48) 


10000 


17.11 

(1.35) 


37.93 43.27 50.71 47.87 
(11.42) (6.85) (9.73) (16.48) 


AIMH 


1000 


13.25 
(1.94) 


40.47 46.70 44.14 42.32 
(24.91) (26.91) (14.51) (20.83) 


5000 


29.97 
(2.23) 


8.81 10.87 11.68 11.82 
(3.49) (2.94) (2.94) (3.48) 


10000 


33.64 
(1.51) 


7.42 9.24 9.08 10.07 
(1.46) (2.25) (3.26) (6.39) 




Fully Adapted Particle Filter 


ARWM 


200 


15.35 
(1.04) 


43.82 48.42 56.42 49.56 
(8.54) (10.54) (10.07) (15.87) 


500 


17.45 
(1.23) 


36.52 43.21 52.31 48.06 
(7.25) (12.05) (13.67) (15.76) 


1000 


18.26 
(1.22) 


34.06 40.35 51.91 45.56 
(5.88) (7.70) (7.38) (17.74) 


AIMH 


200 


30.28 
(1.61) 


10.16 11.72 11.34 10.44 
(4.44) (4.39) (3.46) (3.33) 


500 


34.44 
(1.38) 


7.76 8.68 8.53 10.17 2 
(1.97) (3.27) (2.02) (5.40) 


1000 


36.19 
(1.54) 


6.61 9.51 7.90 8.01 
(2.05) (3.68) (1.66) (5.14) 



Table 13: UK MSC1 index returns: Standard deviation of the simulated log-likelihood func- 
tion at the posterior mean for standard particle filter and fully adapted particle filter using 
various numbers of particles and 1000 replications. 



# Particles 


SIR sd 


jj= Particles 


FAPF sd 


1000 


1.73 


200 


0.67 


5000 


0.71 


500 


0.43 


10000 


0.51 


1000 


0.31 



References 

Andrieu, C. and Doucet, A. (2002), "Particle filtering for partially observed Gaussian state 
space models," Journal of the Royal Statistical Society, Series B, 64, 827-836. 

Andrieu, C, Doucet, A., and Holenstein, R. (2010), "Particle Markov chain Monte Carlo 



19 



Table 14: Su mmary of statistics of the posterio r distribution. 



Parameter 
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St.Dev. 
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0.0412842 



methods," Journal of the Royal Statistical Society, Series B, 72, 1-33. 

Andrieu, C. and Roberts, G. (2009), "The pseudo-marginal approach for efficient Monte 
Carlo computations," The Annals of Statistics, 37, 697-725. 

Atchade, Y. and Rosenthal, J. (2005), "On adaptive Markov chain Monte Carlo algorithms." 
Bernoulli, 11, 815-828. 

Beaumont, M. (2003), "Estimation of population growth or decline in genetically monitored 
populations," Genetics, 164, 1139. 

Bollerslev, T., Engle, R. F., and Nelson, D. (1994), "ARCH models," in HANDBOOK 
OF ECONOMETRICS, eds. Engle, R. and McFadden, D., Amsterdam: Elsevier, vol. 4, 
chap. 49, pp. 2959-3038. 

Cappe, O., Moulines, E., and Ryden, T. (2005), Inference in Hidden Markov Models, New 
York: Springer. 

Carpenter, J. R., Clifford, P., and Fearnhead, P. (1999), "An improved particle filter for 
non-linear problems," IEE Proceedings on Radar, Sonar and Navigation, 146, 2-7. 

Chen, M. H. and Shao, Q. M. (1997), "On Monte Carlo methods for estimating ratios of 
normalizing constants," The Annals of Statistics, 25, 1563-1594. 

Del Moral, P. (2004), Feynman-Kac Formulae: Genealogical and Interacting Particle Systems 
with Applications, New York: Springer. 

Del Moral, P., Doucet, A., and Jasra, A. (2006), "Sequential Monte Carlo samplers," Journal 
of the Royal Statistical Society Series B, 68, 411-436. 

Doucet, A., de Freitas, N., and Gordon, N. (2001), Sequential Monte Carlo Methods in 
Practice, New York: Springer. 

Doucet, A., Godsill, S., and Andrieu, C. (2000), "On sequential Monte Carlo sampling meth- 
ods for Bayesian filtering," Statistics and Computing, 10, 197-208. 

Fearnhead, P. and Clifford, P. (2003), "On-line inference for hidden Markov models via 
particle filters," Journal of the Royal Statistical Society Series B, 65, 887-899. 

Fernandez- Villaverde, J. and Rubio- Ramirez, J. (2007), "Estimating macroeconomic models: 
a likelihood approach," Review of Economic Studies, 74, 1059-1087. 



20 



Flury, T. and Shephard, N. (2008), "Bayesian inference based only on simulated likelihood: 
particle filter analysis of dynamic economic models," http://www.economics.ox.ac.uk/- 
Research/wp/pdf/paper413.pdf. 

Fruhwirth-Schnatter, S. and Wagner, H. (2006), "Auxiliary mixture sampling for parameter- 
driven models of time series of counts with applications to state space modelling," 

Biometrika, 93, 827-841. 

Fruhwirth-Schnatter, S. and Wagner, H. (2008), "Marginal likelihoods for non-Gaussian mod- 
els using auxiliary mixture sampling," Computational Statistics and Data Analysis, 52, 
4608-4624. 

Geweke, J. (1989), "Bayesian inference in econometric models using Monte Carlo integra- 
tion," Econometrica, 57, 1317-1339. 

Giordani, P. and Kohn, R. (2010), "Adaptive Independent Metropolis-Hastings by Fast Es- 
timation of Mixture of Normals," Journal of Computational and Graphical Statistics, see 
http://pubs.amstat.org/doi/abs/10.1198/jcgs.2009.07174. 

Gordon, N. J., Salmond, D. J., and Smith, A. F. M. (1993), "A novel approach to non- 
linear and non-Gaussian Bayesian state estimation," Radar and Signal Processing, IEE 
Proceedings F, 140, 107-113. 

Haario, H., Saksman, E., and Tamminen, J. (2001), "An adaptive Metropolis algorithm," 
Bernoulli, 7, 223-242. 

Kim, S., Shephard, N., and Chib, S. (1998), "Stochastic volatility: Likelihood inference and 
comparison with ARCH models," Review of Economic Studies, 65, 361-393. 

Kitagawa, G. (1996), "Monte Carlo filter and smoother for non-Gaussian non-linear state 
space models," Journal of Computational and Graphics Statistics, 5, 1-25. 

Liu, J. S. and Chen, R. (1998), "Sequential Monte Carlo methods for dynamic systems," 
Journal of the American Statistical Association, 93, 1032-1044. 

Malik, S. and Pitt, M. K. (2008), "Modeling Stochastic Volatility with Leverage and Jumps: 
A 'Smooth' Particle Filtering Approach," Available at http://www.riksbank.com/upload/- 
Research / Conferences / StateSpace2008 /Pitt .pdf . 

Meng, X. L. and Wong, W. H. (1996), "Simulating ratios of normalizing constants via a 
simple identity: A theoretical exploration," Statistica Sinica, 6, 831-860. 

Pitt, M. and Shephard, N. (1999), "Filtering via simulation: auxiliary particle filter," 94, 
590-599. 

- (2001), "Auxiliary variable based particle filters," in Sequential Monte Carlo Methods in 
Practice, eds. de Freitas, N., Doucet, A., and Gordon, N. J., New York: Springer- Verlag, 
pp. 273-293. 

Pitt, M. K. (2002), "Smooth particle filters for likelihood evaluation and maximization," . 



21 



Poison, N. G., Stroud, J. R., and Miiller, P. (2008), "Practical filtering with sequential 
parameter learning." Journal of the Royal Statistical Society, Series B, 70, 413-428. 

Roberts, G. O., Gelman, A., and Gilks, W. R. (1997), "Weak convergence and optimal scaling 
of random walk Metropolis algorithms." Annals of Applied Probability, 7, 110-120. 

Roberts, G. O. and Rosenthal, J. S. (2007), "Coupling and ergodicity of adaptive MCMC," 
Journal of Applied Probability, 44, 458-475. 

- (2009), "Examples of adaptive MCMC," Journal of Computational and Graphical Statis- 
tics, 18, 349-367. 

Shephard, N. (2005), Stochastic Volatilty: Selected Readings, Oxford: Oxford University 
Press. 

Shephard, N. and Pitt, M. (1997), "Likelihood analysis of non-Gaussian measurement time 
series," 84, 653-667. 

Silva, R., Giordani, P., Kohn, R., and Pitt, M. (2009), "Particle filtering within adaptive 
Metropolis Hastings sampling," Http://arxiv.org/abs/091 1.0230. 

Smith, J. and Santos, A. (2006), "Second-Order Filter Distribution Approximations for Fi- 
nancial Time Series With Extreme Outliers," Journal of Business and Economic Statistics, 
24, 329-337. 

Storvik, G. (2002), "Particle filters for state-space models with the presence of unknown 
static parameters," IEEE Transactions on Signal Processing, 50, 281-290. 

West, M. and Harrison, J. (1997), Bayesian Forecasting and Dynamic Models, New York: 
Springer- Verlag, 2nd ed. 



Appendices 



A Proof that the AISR likelihood is unbiased 

This appendix proves Theorem [T] using an iterated expectations argument on the simulated 



likelihood. A similar result is obtained in Proposition 7.4.1 of Section 7.4.2 by Del Moral 



(2004) by showing that the difference of the measure on the states induced by the particle 
filter and that of the limiting Feyman-Kac measure is a martingale. We believe that our 
proof which deals specifically with the unbiasedness of the simulated likelihood is simpler 
and more direct and is accessible to a much wider range of readers. 



22 



Before giving the proof we define some terms that are used in Algorithm [TJ 

M 

PMi x t\yi:t) = ^2^5(x t - x k ), where rr k is given in Step (4). 

k=l 

M 

ffAifatlyia+i) = ^2^ lt+1 5{x t -x k t ), where x k t ~ jfa{x t \y lrt ), 



k=l 

9M(x t \yi:t) = J g{x t \x t -i,yt)gti{xt-i\yi:t)dxt-i, 

/ \ / I \ / \ p{yt+i\x t+ i)p(x t+1 \x t ) 

<*H\t+i{x t ) =g(y t+1 \x t )ir t , u t+1 (x t+1 ; x t ) - 



g(yt+i\xt)g(x t+ i\x t ,y u 



The term PMi x t\yi-.t) is the empirical filtering density arising from step 4 of Algorithm [TJ 
The second term 'g^ t (xt\yi : t+i), is the empirical "look ahead" approximation drawn from in 
step 2. The expression g^M(%t\yi:t) is the filtering approximation which we draw from in step 
3 (integrating over step 2). Furthermore, we have that in Algorithm [TJ u k t+1 = u t \t+i(x k )n k 
and u k +1 = u t+1 (x k +1 ,x k ). 



Lemma 1. 

M 



m\y t W-,-i)\A^\ = X>(lft|a£-i)*£i, 

k=l 

where the swarm of particles at time t is At = {x k ; 7i k }. 



Proof. 



E^iytlvit-i) I At-i 

e jr <*&fLi) , At _ x 



k=l 



M 



M 

£ 

.3=1 



t-l\t 



I u t {xt : xt-i)g{xt\x t -i,yt)gM{xt-i\yi:t)dx t dxt-i < 



-i|t 



M 



utixu x k _ l )g{x t \x k _ 1 ] y t )- 



M 



k=l 

M 



{J2j=l^t-l\t( x t-l)) 

J Ut ( Xt > x t-i)9( x t\ x t-i, yt)u t -i\t{x k -i)dx t 



t-l\t 



.3=1 



k=l 



§ / ^SS§S^ 9(i,|a;ti ' ! '' )s(! '' |a;t>tirfi '' 
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so 

M 



E\p A (y t \yi:t-i) I At-i] = J2 n t-i P(yt\xt)p{x t \xti) dx t 

k=l J 

M 

= 52p(yt\xU)4-i- 



k=l 



Lemma 2. 

M 



E^ A {y t _ h:t \y 1:t _ h _ 1 )\At- h -i\ = ^2p(Vt-h*\xt- h -M-h-i 

k=i 

Proof, (by induction) 

A. Show true for case h = 1, 

E\p A {y t -l:t\yi:t-2)\At-2] = ^[^ 4 (^bl:^l)r 4 (^-lbl:t-2)|A- 2 ] 

= E [£[p A (y t |yi :t _i) | At-i] p A (y t -i\yi:t-2) 

The inner integral is, 

M 



S[p A (y t |yi :t -i) | A t -!} = J2p(yt\xt-i)4-v, 



k=i 
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from Lemma [T] Hence, 

E\p A (y t _ 1:t \y 1:t ^ 2 )\At-2} 



E 



E 



E 



k=l 

M 



i=l 



t-1 



k=l 



k=l 



M 

E 



M n* 1 

.i=i 



t-2|t-l 



i=i 



M 

E 

.i=i 



t-2|i-l 



-2]yt-ijgM{xt-2\yi-.t-i)dxt-idx t - 



yX-2it-i r / y^. ^(ytlgt-iM-i^i; ^-2)^-11^2; y*-i) V 2 
i= i J ^ E,=i< 2 | t -i 

^ yM U [ k 

= z^i 7 ^- 2 / p(yt\xt-i)p{yt-i\xt-i)p(xt-i\x t _ 2 )dxt-i 

EM 
, , p(yt-i-.t\x t _ 2 )7i t _ 2 as required. 
fc=i 

B Assume that the theorem holds for h, 

M 

E\p A {y t -h:t\yi:t-h-i)\A t -h-i] = ^2p{y t ~h:t\x k t _h-i)^t-h-i 



■dxt-i 



k=l 



C Show that the theorem holds for h + 1 : 



E[p A (yt-h-l:t\yi:t-h-2)\A t - h - 2 ] 

E [E^fa-hitlyiit-h-i) I A-h-i] p A {y t -h-i\yv.t-h-2) I A-h-2] 



E 



M 



J2p(yt-h:t\Xt-h-l)^-h, 



k=l 



M i M 

E u t-h-l j I a 

M Z^ U t-h-2\t-h-l I ^t-h-2 
t=l i=l 
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using Lemma [TJ 



E 



M 

k=l 
M 

x < ^Z u i-h-2\t-h-i 

3=1 

M 



t-h-l 



t-h:t\X t _ h _u M . 

l^,i = \ U t-h-l ) \ ,:_:.! 



M 1 



t-h-2 



E 



^^T,p{yt-h-.t\x k t-h-M-h-i I A 



t-h-2 



k=l 



M 



^t-h^t-h-l 

.3=1 



M 



. j=i 



-ft-2|i-ft.-l 



p(j/t-fe:tkt-/i-i)w t _ft-i(a; t _fe_i; xt-h-v) 



g(x t -h-i\xt-h-2]yt-h-i)gM( x t-h-2\yi-.t-h-i)dxt-h-idx t - h -2 

M 



E 



t— ft.— 2|t— ft. 



^(ic^-h-i l^-/,-a; y*-i) 



g(y t - h -i\x^_ h _ 2 )^_ h _. 



dx t _ h . 



EM j 
3=1 U t-h-2\t-h-l 

V^ M k f k 

l^^t-h-2 \ p(yt-h*\Xt-h-l)Ut-h-l{Xt-h-l,tf-h-2) 



»(aJt-h-ikf_/^ 2 ;y*-i)»(j/t-A-il a; t-fc-2) da: t-fc- 



using the definition of u t -h-i (see step 4 of Algorithm), 

r-^M b f k 

= 2^ k=1 n t-h-2 / p(m-ft:tF*-ft-l)p(m-ft-lkt-ft-l)p(^-ft-lFt-ft-2)^ 

= 2^ fc= i P ^ t - ,l - 1: *l x *-' l -2) 7r t-^-2 as squired 



Proof o/[I} As a consequence we have the lemma that, with h = t — 2 

M 

E[p A ( 2/l ,)|A ] = ^p( 2/l ,|xS) 
where Xq ~ p(xo) and 7Tq = 1/M, 



fc=i 



7T n 



E 



M 



^2p(yi-.t\4)4 



k=l 



p{yi:t\xo)p(x )dx Q =p(y 



l:t 



as required. 
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B Fully and partially adapted particle filter 
B.l Partially adapted particle filter 

The partially adapted particle filter used in our article is described in general as follows. We 
omit dependence on unknown parameters for clarity. Suppose that p(x t +x \x t ) ~ Af(fi(x t ), 
and p{y t+1 \x t+ i) is log-concave as a function of x t +x- Let £(x t +i) = \ogp(y t+ i\x t+ i), X(x t+ i,k) = 
£{x t+ i) + logp(x t+ i\xt) and let 



x k +1 = arg max X(x t+1 , k), T, k +1 

Xt + l 



d 2 X(x t +i,k) 
dxt+idxt+i 



-1 



xt+i=xl 



Then, we take g(x t+ i\x k ,y t+1 ) = N(x t+1 ;x k +1 ,T, k +1 ) and g(y t+1 \x^) (x p(y t+ i\x k +1 )p(x k +1 \x k ) det(E k +1 ) 
This is obtained from the second order approximation 

X(x t+1 , k) = X(x t+1 , k) + log (det(Ef +1 )s) - i(art+i - x k t+1 )' (± k t+ ^j * (x t+1 - x k t+l ) - log (det(S t fc +1 ) 

The mode x k +1 is obtained by Newton- Raphson iteration with the starting value fi(x k ), or 
some problem specific starting value as in the binomial example. 

= 



An alternative iterative scheme to obtain x k +l is based on solving d\(x)/dx 
d£(x)/dx — Y*(x k )~ l (x — fi(x k )). The iteration is given by 

x k t+l = fi(x k ) + E(x k )d£(x k +1 )/dx k +1 . 



(16) 



A single iteration of (16) is usually faster than a single iteration of the Newton- Raphson 



scheme but the actual speed of convergence depends on the problem. In practice, we can 
make the iterations to the mode faster by just taking a fixed small number of steps of either 



the Newton- Raphson or (16), or by making the convergence criterion less strict. If we take a 



fixed number of steps then the iterations can be vectorized over k. 



Binomial example For the binomial example discussed in Section ^2 we use the 
Newton-Raphson iteration with starting value fi(x k ) or logit(?/ t /m) if m is large. 



B.2 Fully adaptive particle filter 

Full adaptation is possible whenever p(x t +i\x t ) is conjugate in x t +i to p{yt+x\xt + x / 



Gaussian observation equation Suppose that the observation equation is Gaussian 



with p(yt\x t ) ~ N( H t x t , Vt) and the state transition equation is the same as in Section |B3 
Then, from Section 



B.l 



x 



t+x 



and are obtained explicitly as 



n t+x 



{H' t+l v t - + \H t+1 + i:{x k )- l Y l 



't+X 



£m (v t ;iy i+1 + £(x 4 fc rV(^)) 



and p(yt+x\xt) * s obtained as in Section B.l 
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Garch model We use the notation in Section It is straightforward to show that 
p(y t+1 \x t ,cr 2 ) ~ N(0,a 2 +1 + r 2 ) and that p(x t+ i\yt+i,x t ) ~ N(a t +i, A m ), where 

A f +i = (r 2 )- 1 + {a 2 t+l Y\ a t+1 = A t+l y t+1 /r 2 . 

C Adaptive sampling schemes 

This appendix describes the two adaptive sampling schemes used in the paper. 



C.l Adaptive random walk Metropolis 



The adaptive random walk Metropolis proposal of Roberts and Rosenthal (2009) is 



qj(0; 0j_i) = uij(j>d{0; Oj-i, «iEi) + uj 2j (fid(9; n 2 T, 2j ] 



(17) 



where d is the dimension of 9 and 4>d{9] 9, E) is a multivariate d dimensional normal density in 
9 with mean 9 and covariance matrix E. In (17), U\j = 1 for j < j , with j representing the 
initial iterations, oj\j = 0.05 for j > j with u 2 j = 1 — K\ = 0.1 2 /<i, k 2 = 2.38 2 /<i, Ei is a 



constant covariance matrix, which is taken as the identity matrix by Roberts and Rosenthal 



(2009) but can be based on the Laplace approximation or some other estimate. The matrix 



Ti 2 j is the sample covariance matrix of the first j — 1 iterates. The scalar K\ is meant to achieve 
a high acceptance rate by moving the sampler locally, while the scalar k 2 is considered to be 



optimal (Roberts et al. , 1997) for a random walk proposal when the target is a multivariate 



normal. We note that the acceptance probability (pi) for the adaptive random walk Metropolis 
simplifies to 

p(y|0j,«JM0") l {1> 



a 



min< 1 



p(y\9 j - 1 ,u j _ 1 )p(9 



3-1, 



C.2 A proposal density based on a mixture of normals 



The proposal density of the adaptive independent Metropolis-Hastings approach of Giordani 



and Kohn (2010) is a mixture with four terms of the form 



= J~] u kj9k(0\^kj) u kj > 0, for k = 1, . . . ,4 and = 1 



(19) 



k=i 



k=l 



with Afcj the parameter vector for the density gkj{9;Xkj)- The sampling scheme is run in 
two stages, which are described below. Throughout each stage, the parameters in the first 
two terms are kept fixed. The first term gi(9\\\j) is an estimate of the target density and 
the second term g 2 {9\\ 2 j) is a heavy tailed version of gi(9\\\j). The third term g 3 (9\\3j) is 
an estimate of the target that is updated or adapted as the simulation progresses and the 
fourth term g^lX^j) is a heavy tailed version of the third term. In the first stage gij(9; Ay) 
is a Gaussian density constructed from a preliminary run, of the three component adaptive 
random walk. Throughout, g 2 (9\X 2 j) has the same component means and probabilities as 
gx(9\Xij), but its component covariance matrices are ten times those of gi(9\\ij). The term 
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g 3 (6\X 3 j) is a mixture of normals and g^{d\\^j) is also a mixture of normals obtained by 
taking its component probabilities and means equal to those of g 3 (6\X 3 j), and its component 
covariance matrices equal to 20 times those of g 3 (9\\ 3 j). The first stage begins by using 
gi(9\Xij) and f?2 I -^2^) only with, for example, = 0.8 and u) 2 j = 0.2, until there is a 
sufficiently large number of iterates to form g 3 (6\X 3 j). After that we set = 0.15, u^j — 
0.05, oj 3 j = 0.7 and = 0.1. We begin with a single normal density for g 3 (0\X 3 j) and as 
the simulation progresses we add more components up to a maximum of six according to a 
schedule that depends on the ratio of the number of accepted draws to the dimension of 9. 

In the second stage, gi{6\\ij) is set to the value of g 3 {0\X 3 j) at the end of the first stage 
and g2{d\\2j) and g±(9\\4j) are constructed as described above. The heavy-tailed densities 
92(0\\2j) and g^OlX^j) are included as a defensive strategy to get out of local modes and to 
explore the sample space of the target distribution more effectively. 

It is computationally too expensive to update g 3 (8\\ 3 j) (and hence ^4(6*^4^)) at every 
iteration so we update them according to a schedule that depends on the problem and the 
size of the parameter vector. 



C.3 Proof of the convergence of the adaptive independent Metropo- 
lis Hastings sampling scheme 

The following convergence results hold for the adaptive independent Metropolis Hastings 



sampling scheme described in Appendix C.2 (and more fully in Giordani and Kohn (2010)) 
when it is combined with the ASIR particle filter. They follow from Theorems 1 and 2 of 



Giordani and Kohn (2010). Let G be the parameter space of 9. 



Theorem 2. Suppose that there exists a constant < C < 00 that does not depend on 
t = 1, . . . ,T,9 & Q and the number of iterates j such that 



g{y t +i\x t ;9) < C, 
p(y t+1 \x t+ i;6)p(x t+ i\x t ;l 



<C, 



9(yt+i\xt;0)g(x t+1 \y t+1 ,x t ; 

p(e)/ qj (e) < c . 



(20) 
(21) 
(22) 



Then, 



1. The simulated likelihood is bounded uniformly in 6 6 Q. 

2. The iterates 8j of the adaptive independent Metropolis Hastings sampling scheme con- 
verge to a sample from p{6\y) in the sense that 



sup I Pr(#j e A) — p(6 I y)d6 \ — > as j — > 00. 



(23) 

Ace J a 

for all measurable sets AofQ. 
3. Suppose that h{6) is a measurable function of 6 that is square integrable with respect to 
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the density g2- Then, almost surely, 



-J>(0i)^ / h(6)p(6\y)dB 

U .7=1 J 



as n — > oo. 



(24) 



Proof. 



T-l 



Ps(y\9,u) = Y[ps(yt+i\yi:t;0,u) < C 2T because p s (yt\yi:t-i]0) < C 2 



t=o 



from ^ and our assumptions. This shows that the simulated likelihood ps(y\9, u) is bounded 
and the result now follows from Giordani and Kohn (2010). □ 



We note that as in Giordani and Kohn (2010) it is straightforward to choose the proposal 



density qj{6) as a mixture with one component that is at least as heavy tailed as p{6) to 
ensure that (22) holds. 



The next corollary gives a condition for equations (20) and 21 to hold for the standard 
particle filter and the fully adapted particle filter. 

Corollary 1. Suppose that for all y t ,x t ,t = 1,...,T and 9 G O, there exists a constant 
C\ > such that 



p(y t \x t ;0) < C x . 



(25) 



Then equations (20) and (21) hold for the standard particle filter and the fully adapted particle 
filter. 



Proof. We have 



p(y t+1 \x t ;9) = / p(y t+ i\xt+i;9)p(xt + i\x t ;9)dxt + i < C x 



and the result follows for the standard particle filter and the fully adapted particle filter. □ 

We note that usually p(y t \x t ; 9) is uniformly bounded in y t , x t and 9 for t — 1, . . . , T. This 
is true for the models in Sections H] and [5l 



We now construct a partially adapted particle filter that satisfies equations (20) and (21). 



Suppose that g (y t+ i\x t ; 9) and go(x t+ i \ yt+i, x t ; 9) correspond to a partially adapted particle 
filter which we refer to as go, e.g. the partially adapted particle filter described in Section 



B.l Let < e < 1. Now construct the partially adapted particle filter g as a mixture taking 
the value go with probability 1 — e and being the standard particle filter with probability e. 
That is, 



g(y t+ i\xt]9)g(x t+1 \xt,y t+ i;9) = ep(x t+1 \x t ) + (1 - e)g (y t+ i\x t )go(x t+1 \x t ,y t+1 ; 



(26) 



by equation (26). Then, equations (20) and (21) hold 



Corollary 2. Suppose equation (25) holds and the partially adapted particle filter is defined 



The proof is straightforward. 
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Usually, we would take e quite small so that most of the time the partially adapted 
particle filter go is used. Using the mixture partially adapted particle filter ensures that 
the simulated likelihood is bounded which is important to successfully use the adaptive 
independent Metropolis Hastings to sample the parameters. 

D Marginal likelihood evaluation using bridge and im- 
portance sampling 

Suppose that q{9) is an approximation to p{9\y) which can be evaluated explicitly. Bridge 



sampling (Meng and Wong, 1996) estimates the marginal likelihood as follows. Let 



tW= (KM + 5(e) )- 1 , 

where U is a positive constant. Let 

A = j t(6)q(6)p(6 | y)dO . Then, 

a r (27) 

A = —\ where A x = / t(9)q(9)p(y I 9)p(9)d9 . 

p{y) J 

Suppose the sequence of iterates {9^\j = 1, . . . , M} is generated from the posterior density 
p(9\y) and a second sequence of iterates {9^ k \ k — 1, . . . , M} is generated from q(9). Then 

1 M , k ^ 

j=l k=l A 

are estimates of A and A\ and Pbs(v) is the bridge sampling estimator of the marginal 
likelihood p(y). 

In adaptive sampling, q(9) is the mixture of normals proposal. Although U can be any 
positive constant, it is more efficient if U is a reasonable estimate of p(y). One way to do 
so is to take U = p{y\9*)p(9*) j q(9*) , where 9* is the posterior mean of 9 obtained from the 
posterior simulation. 

An alternative method to estimate of the marginal likelihood p(y) is to use importance 
sampling based on the proposal distribution q{9) ( |Geweke , 1989 Chen and Shao, 1997). That 
is, 

K . p (y\eW)p(eW) 



K ^ q(0W) ' 

Since our proposal distributions have at least one heavy tailed component, the importance 
sampling ratios are likely to be bounded and well-behaved, as in the examples in this paper. 
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E Implementation details 

We coded most of the algorithms in MATLAB, with a small proportion of the code written 
using C/Mex files. We carried out the estimation on an SGI cluster with 42 compute nodes. 
Each of them is an SGI Altix XE320 with two Intel Xeon X5472 (quad core 3.0GHz) CPUs 
with at least 16GB memory. We ran parallel jobs using up to eight processors and MATLAB 
2009. 



32 



